RFC: Use non-blocking device side pointer mode in CUBLAS, with fallbacks #2616
base: master
Conversation
I can also add some more
Sample speedup:

julia> using CUDA, CUDA.CUBLAS, LinearAlgebra;
julia> n = Int(2^26);
julia> X = CUDA.rand(Float64, n);
julia> res = CuRef{Float64}(0.0);
# do some precompilation runs first
julia> @time CUBLAS.nrm2(n, X, res);
0.000104 seconds (18 allocations: 288 bytes)
julia> @time CUBLAS.nrm2(n, X);
0.001564 seconds (73 allocations: 3.094 KiB)
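To make the transcript above concrete, here is a minimal sketch of the intended non-blocking pattern (assuming the CuRef-accepting nrm2 method this PR adds): the call only enqueues on the stream, and the host pays the synchronization cost when the scalar is actually fetched.

```julia
using CUDA, CUDA.CUBLAS

X = CUDA.rand(Float64, 2^26)
res = CuRef{Float64}(0.0)        # device-side result buffer
CUBLAS.nrm2(length(X), X, res)   # enqueues; does not block the host
# ... queue more GPU work here ...
val = res[]                      # dereferencing copies back and synchronizes
```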
CUDA.jl Benchmarks
Benchmark suite | Current: cc7e01f | Previous: 4bec614 | Ratio
---|---|---|---
latency/precompile | 45488869455 ns | 45396234276 ns | 1.00
latency/ttfp | 6280704627 ns | 6416277525.5 ns | 0.98
latency/import | 2962175349 ns | 3047951471 ns | 0.97
integration/volumerhs | 9569307 ns | 9572210 ns | 1.00
integration/byval/slices=1 | 146908 ns | 146689 ns | 1.00
integration/byval/slices=3 | 425424 ns | 424769 ns | 1.00
integration/byval/reference | 145136 ns | 144911 ns | 1.00
integration/byval/slices=2 | 285958 ns | 285674 ns | 1.00
integration/cudadevrt | 103374 ns | 103228 ns | 1.00
kernel/indexing | 14165.5 ns | 13962 ns | 1.01
kernel/indexing_checked | 14847.5 ns | 14556 ns | 1.02
kernel/occupancy | 634.1656804733727 ns | 693.384105960265 ns | 0.91
kernel/launch | 2063.9 ns | 2164.166666666667 ns | 0.95
kernel/rand | 14540.5 ns | 14418 ns | 1.01
array/reverse/1d | 20127 ns | 19581 ns | 1.03
array/reverse/2d | 24975 ns | 24389 ns | 1.02
array/reverse/1d_inplace | 11254 ns | 10606.666666666666 ns | 1.06
array/reverse/2d_inplace | 13089 ns | 11144 ns | 1.17
array/copy | 20876 ns | 20336 ns | 1.03
array/iteration/findall/int | 156012 ns | 156856.5 ns | 0.99
array/iteration/findall/bool | 134582 ns | 135569 ns | 0.99
array/iteration/findfirst/int | 153291 ns | 153474.5 ns | 1.00
array/iteration/findfirst/bool | 152343 ns | 152950 ns | 1.00
array/iteration/scalar | 59300 ns | 60882 ns | 0.97
array/iteration/logical | 202346.5 ns | 202672 ns | 1.00
array/iteration/findmin/1d | 37654 ns | 37856 ns | 0.99
array/iteration/findmin/2d | 93622 ns | 93737 ns | 1.00
array/reductions/reduce/1d | 36564.5 ns | 38166 ns | 0.96
array/reductions/reduce/2d | 51306 ns | 51122 ns | 1.00
array/reductions/mapreduce/1d | 33306 ns | 31151.5 ns | 1.07
array/reductions/mapreduce/2d | 46557.5 ns | 49629.5 ns | 0.94
array/broadcast | 20821 ns | 21225 ns | 0.98
array/copyto!/gpu_to_gpu | 13362 ns | 13324 ns | 1.00
array/copyto!/cpu_to_gpu | 208863 ns | 208348.5 ns | 1.00
array/copyto!/gpu_to_cpu | 241484 ns | 241560 ns | 1.00
array/accumulate/1d | 108701 ns | 108467 ns | 1.00
array/accumulate/2d | 80028 ns | 79962 ns | 1.00
array/construct | 1279.6 ns | 1342.7 ns | 0.95
array/random/randn/Float32 | 43235 ns | 43560.5 ns | 0.99
array/random/randn!/Float32 | 26476 ns | 26195 ns | 1.01
array/random/rand!/Int64 | 27068 ns | 27079 ns | 1.00
array/random/rand!/Float32 | 8805 ns | 8700 ns | 1.01
array/random/rand/Int64 | 30244 ns | 29827 ns | 1.01
array/random/rand/Float32 | 13382 ns | 12930 ns | 1.03
array/permutedims/4d | 61027 ns | 67316 ns | 0.91
array/permutedims/2d | 55247 ns | 56600 ns | 0.98
array/permutedims/3d | 56104 ns | 59248 ns | 0.95
array/sorting/1d | 2775155 ns | 2764861 ns | 1.00
array/sorting/by | 3365800 ns | 3352588 ns | 1.00
array/sorting/2d | 1083397 ns | 1080760 ns | 1.00
cuda/synchronization/stream/auto | 1022.4 ns | 1111.7 ns | 0.92
cuda/synchronization/stream/nonblocking | 6309.4 ns | 6387.8 ns | 0.99
cuda/synchronization/stream/blocking | 800.8117647058823 ns | 831.395061728395 ns | 0.96
cuda/synchronization/context/auto | 1171.5 ns | 1212.1 ns | 0.97
cuda/synchronization/context/nonblocking | 6492.2 ns | 6586.8 ns | 0.99
cuda/synchronization/context/blocking | 912.7941176470588 ns | 916.775 ns | 1.00
This comment was automatically generated by workflow using github-action-benchmark.
Yeah, should one of us open an issue?
On Sat, Jan 11, 2025, Tim Besard commented on this pull request, in lib/cublas/wrappers.jl (#2616):
> function scal!(n::Integer, alpha::Number, x::StridedCuVecOrDenseMat{Float16})
- α = convert(Float32, alpha)
- cublasScalEx(handle(), n, Ref{Float32}(α), Float32, x, Float16, stride(x, 1), Float32)
+ α = CuRef{Float32}( convert(Float32, alpha) )
We should improve CuRef so that it can be constructed identically to Ref.
Ref{T}(x) doing an implicit convert is pretty convenient.
Is the test failure something I've done? It seems GPUArrays-related.
Force-pushed from a0829fa to 5d52d10.
OK, I think this is ready for review!
I am not qualified to review, but certainly interested in the outcome. Will the non-blocking methods only accept CuRef objects for the scalar input or output quantities, or also zero-dimensional arrays (i.e. CuArray{T,0})?
For now, only CuRef, but these are easy to create (it's exported by CUDA.jl). I think one can also create them without a copy from a regular CuArray?
On Thu, Jan 16, 2025, Jutho wrote:
I am not qualified to review, but certainly interested in the outcome.
Will the non-blocking methods only accept CuRef objects for the scalar
input or output quantities, or also zero-dimensional arrays (i.e.
CuArray{T,0})?
You can create a
I wonder if we should also improve CuRef to initialize its memory by calling fill instead of memcpy: when calling memcpy, the copy likely won't be truly asynchronous (that would require pinned memory). But if we call fill, which should be possible for most scalars, the argument is passed by value and I think the call will complete asynchronously. Something to investigate!
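As a rough illustration of the difference (a sketch of the two initialization strategies under discussion, not the current CuRef implementation):

```julia
using CUDA

buf = CuArray{Float32}(undef, 1)  # one-element device buffer

# fill!-style: the scalar travels as a kernel argument (by value),
# so the initialization can complete asynchronously on the stream.
fill!(buf, 1.0f0)

# memcpy-style: a copy from (unpinned) host memory, which the driver
# may serialize rather than run truly asynchronously.
copyto!(buf, [1.0f0])
```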
lib/cublas/wrappers.jl
Outdated
α = convert(T, alpha)
gpu_α = CuRef{T}(α)
The converts can go; CuRef does that for you:
julia> CuRef{Float32}(1)
CUDA.CuRefArray{Float32, CuArray{Float32, 1, CUDA.DeviceMemory}}(Float32[1.0], 1)
α = convert(T, alpha)
gpu_α = CuRef{T}(α)
scal!(n, gpu_α, x)
synchronize()
Why the synchronization? The only way to see the changes by this call is to fetch memory, which is a synchronizing operation.
For scal! and other functions which don't return a scalar result, I added this to keep the previous behaviour (so that the entire call is synchronous). I'll remove the sync for things like nrm2! that return a scalar which is copied back anyway.
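The behavioural split being kept here can be sketched as follows (assuming the PR's CuRefArray-accepting scal! methods):

```julia
using CUDA, CUDA.CUBLAS

x = CUDA.rand(Float32, 1024)

# Number-valued alpha: the wrapper uploads alpha, calls scal!, then
# synchronizes, so the whole call behaves synchronously as before.
CUBLAS.scal!(length(x), 2.0f0, x)

# Device-side alpha: the call only enqueues on the stream; the caller
# decides when (or whether) to synchronize.
gpu_α = CuRef{Float32}(2.0f0)
CUBLAS.scal!(length(x), gpu_α, x)
CUDA.synchronize()
```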
Force-pushed from 829083e to fd59678.
Your PR requires formatting changes to meet the project's style guidelines. Suggested changes:

diff --git a/lib/cublas/wrappers.jl b/lib/cublas/wrappers.jl
index 24f414af0..ea213ff66 100644
--- a/lib/cublas/wrappers.jl
+++ b/lib/cublas/wrappers.jl
@@ -115,8 +115,9 @@ for (fname, fname_64, elty) in ((:cublasDscal_v2, :cublasDscal_v2_64, :Float64),
(:cublasCscal_v2, :cublasCscal_v2_64, :ComplexF32))
@eval begin
function scal!(n::Integer,
- alpha::CuRefArray{$elty, CuVector{$elty, M}},
- x::StridedCuVecOrDenseMat{$elty}) where {M <: AbstractMemory}
+ alpha::CuRefArray{$elty, CuVector{$elty, M}},
+ x::StridedCuVecOrDenseMat{$elty}
+ ) where {M <: AbstractMemory}
if CUBLAS.version() >= v"12.0"
$fname_64(handle(), n, alpha, x, stride(x, 1))
else
@@ -147,8 +148,9 @@ for (fname, fname_64, elty, celty) in ((:cublasCsscal_v2, :cublasCsscal_v2_64, :
(:cublasZdscal_v2, :cublasZdscal_v2_64, :Float64, :ComplexF64))
@eval begin
function scal!(n::Integer,
- alpha::CuRefArray{$elty, CuVector{$elty, M}},
- x::StridedCuVecOrDenseMat{$celty}) where {M<:AbstractMemory}
+ alpha::CuRefArray{$elty, CuVector{$elty, M}},
+ x::StridedCuVecOrDenseMat{$celty}
+ ) where {M <: AbstractMemory}
if CUBLAS.version() >= v"12.0"
$fname_64(handle(), n, alpha, x, stride(x, 1))
else
@@ -190,9 +192,9 @@ for (jname, fname, fname_64, elty) in ((:dot, :cublasDdot_v2, :cublasDdot_v2_64,
@eval begin
function $jname(n::Integer,
x::StridedCuVecOrDenseMat{$elty},
- y::StridedCuVecOrDenseMat{$elty},
- result::CuRefArray{$elty, CuVector{$elty, M}},
- ) where {M<:AbstractMemory}
+ y::StridedCuVecOrDenseMat{$elty},
+ result::CuRefArray{$elty, CuVector{$elty, M}},
+ ) where {M <: AbstractMemory}
if CUBLAS.version() >= v"12.0"
$fname_64(handle(), n, x, stride(x, 1), y, stride(y, 1), result)
else
@@ -236,7 +238,7 @@ function dotu(
return result[]
end
-function dot(n::Integer, x::StridedCuVecOrDenseMat{Float16}, y::StridedCuVecOrDenseMat{Float16}, result::CuRefArray{Float16, CuVector{Float16, M}}) where {M<:AbstractMemory}
+function dot(n::Integer, x::StridedCuVecOrDenseMat{Float16}, y::StridedCuVecOrDenseMat{Float16}, result::CuRefArray{Float16, CuVector{Float16, M}}) where {M <: AbstractMemory}
cublasDotEx(handle(), n, x, Float16, stride(x, 1), y, Float16, stride(y, 1), result, Float16, Float32)
return result
end
@@ -263,7 +265,7 @@ for (fname, fname_64, elty, ret_type) in ((:cublasDnrm2_v2, :cublasDnrm2_v2_64,
function nrm2(n::Integer,
X::StridedCuVecOrDenseMat{$elty},
result::CuRefArray{$ret_type, CuVector{$ret_type, M}},
- ) where {M<:AbstractMemory}
+ ) where {M <: AbstractMemory}
if CUBLAS.version() >= v"12.0"
$fname_64(handle(), n, X, stride(X, 1), result)
else
@@ -339,9 +341,10 @@ for (fname, fname_64, elty) in ((:cublasDaxpy_v2, :cublasDaxpy_v2_64, :Float64),
(:cublasCaxpy_v2, :cublasCaxpy_v2_64, :ComplexF32))
@eval begin
function axpy!(n::Integer,
- alpha::CuRefArray{$elty, CuVector{$elty, M}},
+ alpha::CuRefArray{$elty, CuVector{$elty, M}},
dx::StridedCuVecOrDenseMat{$elty},
- dy::StridedCuVecOrDenseMat{$elty}) where {M <: AbstractMemory}
+ dy::StridedCuVecOrDenseMat{$elty}
+ ) where {M <: AbstractMemory}
if CUBLAS.version() >= v"12.0"
$fname_64(handle(), n, alpha, dx, stride(dx, 1), dy, stride(dy, 1))
else
@@ -399,9 +402,9 @@ for (fname, fname_64, elty, cty, sty) in (
function rot!(n::Integer,
x::StridedCuVecOrDenseMat{$elty},
y::StridedCuVecOrDenseMat{$elty},
- c::CuRefArray{$cty, CuVector{$cty, M}},
- s::CuRefArray{$sty, CuVector{$sty, M}},
- ) where {M <: AbstractMemory}
+ c::CuRefArray{$cty, CuVector{$cty, M}},
+ s::CuRefArray{$sty, CuVector{$sty, M}},
+ ) where {M <: AbstractMemory}
if CUBLAS.version() >= v"12.0"
$fname_64(handle(), n, x, stride(x, 1), y, stride(y, 1), c, s)
else
@@ -472,9 +475,9 @@ for (fname, fname_64, elty) in ((:cublasIdamax_v2, :cublasIdamax_v2_64, :Float64
(:cublasIcamax_v2, :cublasIcamax_v2_64, :ComplexF32))
@eval begin
function iamax(n::Integer,
- dx::StridedCuVecOrDenseMat{$elty},
- result::CuRefArray{Ti, CuVector{Ti, M}},
- ) where {Ti <: Integer, M <: AbstractMemory}
+ dx::StridedCuVecOrDenseMat{$elty},
+ result::CuRefArray{Ti, CuVector{Ti, M}},
+ ) where {Ti <: Integer, M <: AbstractMemory}
if CUBLAS.version() >= v"12.0"
$fname_64(handle(), n, dx, stride(dx, 1), result)
else
@@ -493,9 +496,9 @@ for (fname, fname_64, elty) in ((:cublasIdamin_v2, :cublasIdamin_v2_64, :Float64
(:cublasIcamin_v2, :cublasIcamin_v2_64, :ComplexF32))
@eval begin
function iamin(n::Integer,
- dx::StridedCuVecOrDenseMat{$elty},
- result::CuRefArray{Ti, CuVector{Ti, M}},
- ) where {Ti <: Integer, M <: AbstractMemory}
+ dx::StridedCuVecOrDenseMat{$elty},
+ result::CuRefArray{Ti, CuVector{Ti, M}},
+ ) where {Ti <: Integer, M <: AbstractMemory}
if CUBLAS.version() >= v"12.0"
$fname_64(handle(), n, dx, stride(dx, 1), result)
else
@@ -529,11 +532,12 @@ for (fname, fname_64, elty) in ((:cublasDgemv_v2, :cublasDgemv_v2_64, :Float64),
(:cublasCgemv_v2, :cublasCgemv_v2_64, :ComplexF32))
@eval begin
function gemv!(trans::Char,
- alpha::CuRefArray{$elty, CuVector{$elty, M}},
+ alpha::CuRefArray{$elty, CuVector{$elty, M}},
A::StridedCuMatrix{$elty},
x::StridedCuVector{$elty},
- beta::CuRefArray{$elty, CuVector{$elty, M}},
- y::StridedCuVector{$elty}) where {M<:AbstractMemory}
+ beta::CuRefArray{$elty, CuVector{$elty, M}},
+ y::StridedCuVector{$elty}
+ ) where {M <: AbstractMemory}
# handle trans
m,n = size(A)
# check dimensions
@@ -558,10 +562,10 @@ function gemv!(trans::Char, alpha::Number, A::StridedCuMatrix{T}, x::StridedCuVe
synchronize()
return y
end
-function gemv(trans::Char, alpha::CuRefArray{T, CuVector{T, M}}, A::StridedCuMatrix{T}, x::StridedCuVector{T}) where {T, M<:AbstractMemory}
+function gemv(trans::Char, alpha::CuRefArray{T, CuVector{T, M}}, A::StridedCuMatrix{T}, x::StridedCuVector{T}) where {T, M <: AbstractMemory}
return gemv!(trans, alpha, A, x, CuRef{T}(zero(T)), similar(x, size(A, (trans == 'N' ? 1 : 2))))
end
-function gemv(trans::Char, alpha::Number, A::StridedCuMatrix{T}, x::StridedCuVector{T}) where T
+function gemv(trans::Char, alpha::Number, A::StridedCuMatrix{T}, x::StridedCuVector{T}) where {T}
gemv!(trans, alpha, A, x, zero(T), similar(x, size(A, (trans == 'N' ? 1 : 2))))
end
# should this be async?
@@ -579,12 +583,12 @@ for (fname, fname_64, eltyin, eltyout, eltyconst) in (
)
@eval begin
function gemv_batched!(trans::Char,
- alpha::CuRefArray{$eltyconst, CuVector{$eltyconst, M}},
- A::Vector{<:StridedCuMatrix{$eltyin}},
- x::Vector{<:StridedCuVector{$eltyin}},
- beta::CuRefArray{$eltyconst, CuVector{$eltyconst, M}},
- y::Vector{<:StridedCuVector{$eltyout}}
- ) where {M<:AbstractMemory}
+ alpha::CuRefArray{$eltyconst, CuVector{$eltyconst, M}},
+ A::Vector{<:StridedCuMatrix{$eltyin}},
+ x::Vector{<:StridedCuVector{$eltyin}},
+ beta::CuRefArray{$eltyconst, CuVector{$eltyconst, M}},
+ y::Vector{<:StridedCuVector{$eltyout}}
+ ) where {M <: AbstractMemory}
if length(A) != length(x) || length(A) != length(y)
throw(DimensionMismatch("Lengths of inputs must be the same"))
end
@@ -615,13 +619,13 @@ for (fname, fname_64, eltyin, eltyout, eltyconst) in (
y
end
function gemv_batched!(
- trans::Char,
- alpha::Number,
- A::Vector{<:StridedCuMatrix{$eltyin}},
- x::Vector{<:StridedCuVector{$eltyin}},
- beta::Number,
- y::Vector{<:StridedCuVector{$eltyout}}
- )
+ trans::Char,
+ alpha::Number,
+ A::Vector{<:StridedCuMatrix{$eltyin}},
+ x::Vector{<:StridedCuVector{$eltyin}},
+ beta::Number,
+ y::Vector{<:StridedCuVector{$eltyout}}
+ )
gpu_α = CuRef{$eltyconst}(alpha)
gpu_β = CuRef{$eltyconst}(beta)
y = gemv_batched!(trans, gpu_α, A, x, gpu_β, y)
@@ -641,12 +645,12 @@ for (fname, fname_64, eltyin, eltyout, eltyconst) in (
)
@eval begin
function gemv_strided_batched!(trans::Char,
- alpha::CuRefArray{$eltyconst, CuVector{$eltyconst, M}},
- A::AbstractArray{$eltyin, 3},
- x::AbstractArray{$eltyin, 2},
- beta::CuRefArray{$eltyconst, CuVector{$eltyconst, M}},
- y::AbstractArray{$eltyout, 2}
- ) where {M<:AbstractMemory}
+ alpha::CuRefArray{$eltyconst, CuVector{$eltyconst, M}},
+ A::AbstractArray{$eltyin, 3},
+ x::AbstractArray{$eltyin, 2},
+ beta::CuRefArray{$eltyconst, CuVector{$eltyconst, M}},
+ y::AbstractArray{$eltyout, 2}
+ ) where {M <: AbstractMemory}
if size(A, 3) != size(x, 2) || size(A, 3) != size(y, 2)
throw(DimensionMismatch("Batch sizes must be equal for all inputs"))
end
@@ -671,13 +675,13 @@ for (fname, fname_64, eltyin, eltyout, eltyconst) in (
y
end
function gemv_strided_batched!(
- trans::Char,
- alpha::Number,
- A::AbstractArray{$eltyin, 3},
- x::AbstractArray{$eltyin, 2},
- beta::Number,
- y::AbstractArray{$eltyout, 2}
- )
+ trans::Char,
+ alpha::Number,
+ A::AbstractArray{$eltyin, 3},
+ x::AbstractArray{$eltyin, 2},
+ beta::Number,
+ y::AbstractArray{$eltyout, 2}
+ )
gpu_α = CuRef{$eltyconst}(alpha)
gpu_β = CuRef{$eltyconst}(beta)
y = gemv_strided_batched!(trans, gpu_α, A, x, gpu_β, y)
@@ -697,11 +701,12 @@ for (fname, fname_64, elty) in ((:cublasDgbmv_v2, :cublasDgbmv_v2_64, :Float64),
m::Integer,
kl::Integer,
ku::Integer,
- alpha::CuRefArray{$elty, CuVector{$elty, M}},
+ alpha::CuRefArray{$elty, CuVector{$elty, M}},
A::StridedCuMatrix{$elty},
x::StridedCuVector{$elty},
- beta::CuRefArray{$elty, CuVector{$elty, M}},
- y::StridedCuVector{$elty}) where {M<:AbstractMemory}
+ beta::CuRefArray{$elty, CuVector{$elty, M}},
+ y::StridedCuVector{$elty}
+ ) where {M <: AbstractMemory}
n = size(A,2)
# check dimensions
length(x) == (trans == 'N' ? n : m) && length(y) == (trans == 'N' ? m : n) || throw(DimensionMismatch(""))
@@ -716,16 +721,17 @@ for (fname, fname_64, elty) in ((:cublasDgbmv_v2, :cublasDgbmv_v2_64, :Float64),
end
y
end
- function gbmv!(trans::Char,
- m::Integer,
- kl::Integer,
- ku::Integer,
- alpha::Number,
- A::StridedCuMatrix{$elty},
- x::StridedCuVector{$elty},
- beta::Number,
- y::StridedCuVector{$elty}
- )
+ function gbmv!(
+ trans::Char,
+ m::Integer,
+ kl::Integer,
+ ku::Integer,
+ alpha::Number,
+ A::StridedCuMatrix{$elty},
+ x::StridedCuVector{$elty},
+ beta::Number,
+ y::StridedCuVector{$elty}
+ )
gpu_α = CuRef{$elty}(alpha)
gpu_β = CuRef{$elty}(beta)
@@ -735,8 +741,10 @@ for (fname, fname_64, elty) in ((:cublasDgbmv_v2, :cublasDgbmv_v2_64, :Float64),
end
end
end
-function gbmv(trans::Char, m::Integer, kl::Integer, ku::Integer, alpha::CuRefArray{T, CuVector{T, M}},
- A::StridedCuMatrix{T}, x::StridedCuVector{T}) where {T, M<:AbstractMemory}
+function gbmv(
+ trans::Char, m::Integer, kl::Integer, ku::Integer, alpha::CuRefArray{T, CuVector{T, M}},
+ A::StridedCuMatrix{T}, x::StridedCuVector{T}
+ ) where {T, M <: AbstractMemory}
# TODO: fix gbmv bug in julia
n = size(A, 2)
leny = trans == 'N' ? m : n
@@ -759,11 +767,12 @@ for (fname, fname_64, elty) in ((:cublasDspmv_v2, :cublasDspmv_v2_64, :Float64),
(:cublasSspmv_v2, :cublasSspmv_v2_64, :Float32))
@eval begin
function spmv!(uplo::Char,
- alpha::CuRefArray{$elty, CuVector{$elty, M}},
+ alpha::CuRefArray{$elty, CuVector{$elty, M}},
AP::StridedCuVector{$elty},
x::StridedCuVector{$elty},
- beta::CuRefArray{$elty, CuVector{$elty, M}},
- y::StridedCuVector{$elty}) where {M<:AbstractMemory}
+ beta::CuRefArray{$elty, CuVector{$elty, M}},
+ y::StridedCuVector{$elty}
+ ) where {M <: AbstractMemory}
n = round(Int, (sqrt(8*length(AP))-1)/2)
if n != length(x) || n != length(y) throw(DimensionMismatch("")) end
incx = stride(x,1)
@@ -777,21 +786,24 @@ for (fname, fname_64, elty) in ((:cublasDspmv_v2, :cublasDspmv_v2_64, :Float64),
end
end
end
-function spmv!(uplo::Char,
- alpha::Number,
- AP::StridedCuVector{T},
- x::StridedCuVector{T},
- beta::Number,
- y::StridedCuVector{T}
- ) where {T}
+function spmv!(
+ uplo::Char,
+ alpha::Number,
+ AP::StridedCuVector{T},
+ x::StridedCuVector{T},
+ beta::Number,
+ y::StridedCuVector{T}
+ ) where {T}
gpu_α = CuRef{T}(alpha)
gpu_β = CuRef{T}(beta)
y = spmv!(uplo, gpu_α, AP, x, gpu_β, y)
synchronize()
return y
end
-function spmv(uplo::Char, alpha::CuRefArray{T, CuVector{T, M}},
- AP::StridedCuVector{T}, x::StridedCuVector{T}) where {T, M<:AbstractMemory}
+function spmv(
+ uplo::Char, alpha::CuRefArray{T, CuVector{T, M}},
+ AP::StridedCuVector{T}, x::StridedCuVector{T}
+ ) where {T, M <: AbstractMemory}
return spmv!(uplo, alpha, AP, x, CuRef{T}(zero(T)), similar(x))
end
function spmv(uplo::Char, alpha::Number,
@@ -810,11 +822,12 @@ for (fname, fname_64, elty) in ((:cublasDsymv_v2, :cublasDsymv_v2_64, :Float64),
# Note that the complex symv are not BLAS but auiliary functions in LAPACK
@eval begin
function symv!(uplo::Char,
- alpha::CuRefArray{$elty, CuVector{$elty, M}},
+ alpha::CuRefArray{$elty, CuVector{$elty, M}},
A::StridedCuMatrix{$elty},
x::StridedCuVector{$elty},
- beta::CuRefArray{$elty, CuVector{$elty, M}},
- y::StridedCuVector{$elty}) where {M<:AbstractMemory}
+ beta::CuRefArray{$elty, CuVector{$elty, M}},
+ y::StridedCuVector{$elty}
+ ) where {M <: AbstractMemory}
m, n = size(A)
if m != n throw(DimensionMismatch("Matrix A is $m by $n but must be square")) end
if m != length(x) || m != length(y) throw(DimensionMismatch("")) end
@@ -847,7 +860,7 @@ end
function symv(
uplo::Char, alpha::CuRefArray{T, CuVector{T, M}},
A::StridedCuMatrix{T}, x::StridedCuVector{T}
- ) where {T, M<:AbstractMemory}
+ ) where {T, M <: AbstractMemory}
return symv!(uplo, alpha, A, x, CuRef{T}(zero(T)), similar(x))
end
function symv(uplo::Char, alpha::Number,
@@ -864,11 +877,12 @@ for (fname, fname_64, elty) in ((:cublasZhemv_v2, :cublasZhemv_v2_64, :ComplexF6
(:cublasChemv_v2, :cublasChemv_v2_64, :ComplexF32))
@eval begin
function hemv!(uplo::Char,
- alpha::CuRefArray{$elty, CuVector{$elty, M}},
+ alpha::CuRefArray{$elty, CuVector{$elty, M}},
A::StridedCuMatrix{$elty},
x::StridedCuVector{$elty},
- beta::CuRefArray{$elty, CuVector{$elty, M}},
- y::StridedCuVector{$elty}) where {M<:AbstractMemory}
+ beta::CuRefArray{$elty, CuVector{$elty, M}},
+ y::StridedCuVector{$elty}
+ ) where {M <: AbstractMemory}
# TODO: fix dimension check bug in julia
m, n = size(A)
if m != n throw(DimensionMismatch("Matrix A is $m by $n but must be square")) end
@@ -902,7 +916,7 @@ end
function hemv(
uplo::Char, alpha::CuRefArray{T, CuVector{T, M}},
A::StridedCuMatrix{T}, x::StridedCuVector{T}
- ) where {T, M<:AbstractMemory}
+ ) where {T, M <: AbstractMemory}
return hemv!(uplo, alpha, A, x, CuRef{T}(zero(T)), similar(x))
end
function hemv(uplo::Char, alpha::Number, A::StridedCuMatrix{T},
@@ -922,11 +936,12 @@ for (fname, fname_64, elty) in ((:cublasDsbmv_v2, :cublasDsbmv_v2_64, :Float64),
@eval begin
function sbmv!(uplo::Char,
k::Integer,
- alpha::CuRefArray{$elty, CuVector{$elty, M}},
+ alpha::CuRefArray{$elty, CuVector{$elty, M}},
A::StridedCuMatrix{$elty},
x::StridedCuVector{$elty},
- beta::CuRefArray{$elty, CuVector{$elty, M}},
- y::StridedCuVector{$elty}) where {M<:AbstractMemory}
+ beta::CuRefArray{$elty, CuVector{$elty, M}},
+ y::StridedCuVector{$elty}
+ ) where {M <: AbstractMemory}
m, n = size(A)
#if m != n throw(DimensionMismatch("Matrix A is $m by $n but must be square")) end
if !(1<=(1+k)<=n) throw(DimensionMismatch("Incorrect number of bands")) end
@@ -962,7 +977,7 @@ end
function sbmv(
uplo::Char, k::Integer, alpha::CuRefArray{T, CuVector{T, M}},
A::StridedCuMatrix{T}, x::StridedCuVector{T}
- ) where {T, M<:AbstractMemory}
+ ) where {T, M <: AbstractMemory}
return sbmv!(uplo, k, alpha, A, x, CuRef{T}(zero(T)), similar(x))
end
function sbmv(uplo::Char, k::Integer, alpha::Number,
@@ -981,11 +996,12 @@ for (fname, fname_64, elty) in ((:cublasZhbmv_v2, :cublasZhbmv_v2_64, :ComplexF6
@eval begin
function hbmv!(uplo::Char,
k::Integer,
- alpha::CuRefArray{$elty, CuVector{$elty, M}},
+ alpha::CuRefArray{$elty, CuVector{$elty, M}},
A::StridedCuMatrix{$elty},
x::StridedCuVector{$elty},
- beta::CuRefArray{$elty, CuVector{$elty, M}},
- y::StridedCuVector{$elty}) where {M<:AbstractMemory}
+ beta::CuRefArray{$elty, CuVector{$elty, M}},
+ y::StridedCuVector{$elty}
+ ) where {M <: AbstractMemory}
m, n = size(A)
if !(1<=(1+k)<=n) throw(DimensionMismatch("Incorrect number of bands")) end
if m < 1+k throw(DimensionMismatch("Array A has fewer than 1+k rows")) end
@@ -1020,7 +1036,7 @@ end
function hbmv(
uplo::Char, k::Integer, alpha::CuRefArray{T, CuVector{T, M}},
A::StridedCuMatrix{T}, x::StridedCuVector{T}
- ) where {T, M<:AbstractMemory}
+ ) where {T, M <: AbstractMemory}
return hbmv!(uplo, k, alpha, A, x, CuRef{T}(zero(T)), similar(x))
end
function hbmv(uplo::Char, k::Integer, alpha::Number,
@@ -1168,10 +1184,11 @@ for (fname, fname_64, elty) in ((:cublasDger_v2, :cublasDger_v2_64, :Float64),
(:cublasCgerc_v2, :cublasCgerc_v2_64, :ComplexF32))
@eval begin
function ger!(
- alpha::CuRefArray{$elty, CuVector{$elty, M}},
+ alpha::CuRefArray{$elty, CuVector{$elty, M}},
x::StridedCuVector{$elty},
y::StridedCuVector{$elty},
- A::StridedCuMatrix{$elty}) where {M<:AbstractMemory}
+ A::StridedCuMatrix{$elty}
+ ) where {M <: AbstractMemory}
m, n = size(A)
m == length(x) || throw(DimensionMismatch(""))
n == length(y) || throw(DimensionMismatch(""))
@@ -1204,9 +1221,10 @@ for (fname, fname_64, elty) in ((:cublasDspr_v2, :cublasDspr_v2_64, :Float64),
(:cublasSspr_v2, :cublasSspr_v2_64, :Float32))
@eval begin
function spr!(uplo::Char,
- alpha::CuRefArray{$elty, CuVector{$elty, M}},
+ alpha::CuRefArray{$elty, CuVector{$elty, M}},
x::StridedCuVector{$elty},
- AP::StridedCuVector{$elty}) where {M<:AbstractMemory}
+ AP::StridedCuVector{$elty}
+ ) where {M <: AbstractMemory}
n = round(Int, (sqrt(8*length(AP))-1)/2)
length(x) == n || throw(DimensionMismatch("Length of vector must be the same as the matrix dimensions"))
incx = stride(x,1)
@@ -1238,9 +1256,10 @@ for (fname, fname_64, elty) in ((:cublasDsyr_v2, :cublasDsyr_v2_64, :Float64),
(:cublasCsyr_v2, :cublasCsyr_v2_64, :ComplexF32))
@eval begin
function syr!(uplo::Char,
- alpha::CuRefArray{$elty, CuVector{$elty, M}},
+ alpha::CuRefArray{$elty, CuVector{$elty, M}},
x::StridedCuVector{$elty},
- A::StridedCuMatrix{$elty}) where {M<:AbstractMemory}
+ A::StridedCuMatrix{$elty}
+ ) where {M <: AbstractMemory}
m, n = size(A)
m == n || throw(DimensionMismatch("Matrix A is $m by $n but must be square"))
length(x) == n || throw(DimensionMismatch("Length of vector must be the same as the matrix dimensions"))
@@ -1274,9 +1293,10 @@ for (fname, fname_64, elty, relty) in (
)
@eval begin
function her!(uplo::Char,
- alpha::CuRefArray{$relty, CuVector{$relty, M}},
+ alpha::CuRefArray{$relty, CuVector{$relty, M}},
x::StridedCuVector{$elty},
- A::StridedCuMatrix{$elty}) where {M<:AbstractMemory}
+ A::StridedCuMatrix{$elty}
+ ) where {M <: AbstractMemory}
m, n = size(A)
m == n || throw(DimensionMismatch("Matrix A is $m by $n but must be square"))
length(x) == n || throw(DimensionMismatch("Length of vector must be the same as the matrix dimensions"))
@@ -1308,11 +1328,11 @@ for (fname, fname_64, elty) in ((:cublasZher2_v2, :cublasZher2_v2_64, :ComplexF6
(:cublasCher2_v2, :cublasCher2_v2_64, :ComplexF32))
@eval begin
function her2!(uplo::Char,
- alpha::CuRefArray{$elty, CuVector{$elty, M}},
- x::StridedCuVector{$elty},
- y::StridedCuVector{$elty},
- A::StridedCuMatrix{$elty}
- ) where {M<:AbstractMemory}
+ alpha::CuRefArray{$elty, CuVector{$elty, M}},
+ x::StridedCuVector{$elty},
+ y::StridedCuVector{$elty},
+ A::StridedCuMatrix{$elty}
+ ) where {M <: AbstractMemory}
m, n = size(A)
m == n || throw(DimensionMismatch("Matrix A is $m by $n but must be square"))
length(x) == n || throw(DimensionMismatch("Length of vector must be the same as the matrix dimensions"))
@@ -1352,11 +1372,12 @@ for (fname, fname_64, elty) in ((:cublasDgemm_v2, :cublasDgemm_v2_64, :Float64),
@eval begin
function gemm!(transA::Char,
transB::Char,
- alpha::CuRefArray{$elty, CuVector{$elty, M}},
+ alpha::CuRefArray{$elty, CuVector{$elty, M}},
A::StridedCuVecOrMat{$elty},
B::StridedCuVecOrMat{$elty},
- beta::CuRefArray{$elty, CuVector{$elty, M}},
- C::StridedCuVecOrMat{$elty}) where {M<:AbstractMemory}
+ beta::CuRefArray{$elty, CuVector{$elty, M}},
+ C::StridedCuVecOrMat{$elty}
+ ) where {M <: AbstractMemory}
m = size(A, transA == 'N' ? 1 : 2)
k = size(A, transA == 'N' ? 2 : 1)
n = size(B, transB == 'N' ? 2 : 1)
@@ -1393,7 +1414,7 @@ end
function gemm(
transA::Char, transB::Char, alpha::CuRefArray{T, CuVector{T, M}},
A::StridedCuVecOrMat{T}, B::StridedCuVecOrMat{T}
- ) where {T, M<:AbstractMemory}
+ ) where {T, M <: AbstractMemory}
return gemm!(
transA, transB, alpha, A, B, CuRef(zero(T)),
similar(
@@ -1493,10 +1514,10 @@ function gemmExComputeType(TA, TB, TC, m, k, n)
end
function gemmEx!(transA::Char, transB::Char,
- @nospecialize(alpha::CuRefArray),
+ @nospecialize(alpha::CuRefArray),
@nospecialize(A::StridedCuVecOrMat),
@nospecialize(B::StridedCuVecOrMat),
- @nospecialize(beta::CuRefArray),
+ @nospecialize(beta::CuRefArray),
@nospecialize(C::StridedCuVecOrMat);
algo::cublasGemmAlgo_t=CUBLAS_GEMM_DEFAULT)
m = size(A, transA == 'N' ? 1 : 2)
@@ -1551,10 +1572,10 @@ end
# TODO for device mode pointers
function gemmBatchedEx!(transA::Char, transB::Char,
- @nospecialize(alpha::CuRefArray),
+ @nospecialize(alpha::CuRefArray),
@nospecialize(A::Vector{<:StridedCuVecOrMat}),
@nospecialize(B::Vector{<:StridedCuVecOrMat}),
- @nospecialize(beta::CuRefArray),
+ @nospecialize(beta::CuRefArray),
@nospecialize(C::Vector{<:StridedCuVecOrMat});
algo::cublasGemmAlgo_t=CUBLAS_GEMM_DEFAULT)
if length(A) != length(B) || length(A) != length(C)
@@ -1622,11 +1643,11 @@ function gemmBatchedEx!(
end
function gemmStridedBatchedEx!(
- transA::Char, transB::Char,
- @nospecialize(alpha::CuRefArray),
+ transA::Char, transB::Char,
+ @nospecialize(alpha::CuRefArray),
@nospecialize(A::AbstractArray{Ta, 3}),
@nospecialize(B::AbstractArray{Tb, 3}),
- @nospecialize(beta::CuRefArray),
+ @nospecialize(beta::CuRefArray),
@nospecialize(C::AbstractArray{Tc, 3});
algo::cublasGemmAlgo_t=CUBLAS_GEMM_DEFAULT) where {Ta, Tb, Tc}
if size(A, 3) != size(B, 3) || size(A, 3) != size(C, 3)
@@ -1865,11 +1886,12 @@ for (fname, fname_64, elty) in ((:cublasDgemmBatched, :cublasDgemmBatched_64, :F
@eval begin
function gemm_batched!(transA::Char,
transB::Char,
- alpha::CuRefArray{$elty, CuVector{$elty, M}},
+ alpha::CuRefArray{$elty, CuVector{$elty, M}},
A::Vector{<:StridedCuMatrix{$elty}},
B::Vector{<:StridedCuMatrix{$elty}},
- beta::CuRefArray{$elty, CuVector{$elty, M}},
- C::Vector{<:StridedCuMatrix{$elty}}) where {M<:AbstractMemory}
+ beta::CuRefArray{$elty, CuVector{$elty, M}},
+ C::Vector{<:StridedCuMatrix{$elty}}
+ ) where {M <: AbstractMemory}
if length(A) != length(B) || length(A) != length(C)
throw(DimensionMismatch(""))
end
@@ -1948,11 +1970,12 @@ for (fname, fname_64, elty) in ((:cublasDgemmStridedBatched, :cublasDgemmStrided
@eval begin
function gemm_strided_batched!(transA::Char,
transB::Char,
- alpha::CuRefArray{$elty, CuVector{$elty, M}},
+ alpha::CuRefArray{$elty, CuVector{$elty, M}},
A::AbstractArray{$elty, 3}, # allow PermutedDimsArray
B::AbstractArray{$elty, 3},
- beta::CuRefArray{$elty, CuVector{$elty, M}},
- C::AbstractArray{$elty, 3}) where {M<:AbstractMemory}
+ beta::CuRefArray{$elty, CuVector{$elty, M}},
+ C::AbstractArray{$elty, 3}
+ ) where {M <: AbstractMemory}
m = size(A, transA == 'N' ? 1 : 2)
k = size(A, transA == 'N' ? 2 : 1)
n = size(B, transB == 'N' ? 2 : 1)
@@ -2031,11 +2054,12 @@ for (fname, fname_64, elty) in ((:cublasDsymm_v2, :cublasDsymm_v2_64, :Float64),
@eval begin
function symm!(side::Char,
uplo::Char,
- alpha::CuRefArray{$elty, CuVector{$elty, M}},
+ alpha::CuRefArray{$elty, CuVector{$elty, M}},
A::StridedCuMatrix{$elty},
B::StridedCuMatrix{$elty},
- beta::CuRefArray{$elty, CuVector{$elty, M}},
- C::StridedCuMatrix{$elty}) where {M<:AbstractMemory}
+ beta::CuRefArray{$elty, CuVector{$elty, M}},
+ C::StridedCuMatrix{$elty}
+ ) where {M <: AbstractMemory}
k, nA = size(A)
if k != nA throw(DimensionMismatch("Matrix A must be square")) end
m = side == 'L' ? k : size(B,1)
@@ -2073,7 +2097,7 @@ end
function symm(
side::Char, uplo::Char, alpha::CuRefArray{T, CuVector{T, M}},
A::StridedCuMatrix{T}, B::StridedCuMatrix{T}
- ) where {T, M<:AbstractMemory}
+ ) where {T, M <: AbstractMemory}
return symm!(side, uplo, alpha, A, B, CuRef{T}(zero(T)), similar(B))
end
function symm(side::Char, uplo::Char, alpha::Number,
@@ -2093,10 +2117,11 @@ for (fname, fname_64, elty) in ((:cublasDsyrk_v2, :cublasDsyrk_v2_64, :Float64),
@eval begin
function syrk!(uplo::Char,
trans::Char,
- alpha::CuRefArray{$elty, CuVector{$elty, M}},
+ alpha::CuRefArray{$elty, CuVector{$elty, M}},
A::StridedCuVecOrMat{$elty},
- beta::CuRefArray{$elty, CuVector{$elty, M}},
- C::StridedCuMatrix{$elty}) where {M<:AbstractMemory}
+ beta::CuRefArray{$elty, CuVector{$elty, M}},
+ C::StridedCuMatrix{$elty}
+ ) where {M <: AbstractMemory}
mC, n = size(C)
if mC != n throw(DimensionMismatch("C must be square")) end
nn = size(A, trans == 'N' ? 1 : 2)
@@ -2146,11 +2171,12 @@ for (fname, fname_64, elty) in ((:cublasDsyrkx, :cublasDsyrkx_64, :Float64),
@eval begin
function syrkx!(uplo::Char,
trans::Char,
- alpha::CuRefArray{$elty, CuVector{$elty, M}},
+ alpha::CuRefArray{$elty, CuVector{$elty, M}},
A::StridedCuVecOrMat{$elty},
B::StridedCuVecOrMat{$elty},
- beta::CuRefArray{$elty, CuVector{$elty, M}},
- C::StridedCuMatrix{$elty}) where {M<:AbstractMemory}
+ beta::CuRefArray{$elty, CuVector{$elty, M}},
+ C::StridedCuMatrix{$elty}
+ ) where {M <: AbstractMemory}
mC, n = size(C)
if mC != n throw(DimensionMismatch("C must be square")) end
nn = size(A, trans == 'N' ? 1 : 2)
@@ -2186,7 +2212,7 @@ end
function syrkx(
uplo::Char, trans::Char, alpha::CuRefArray{T, CuVector{T, M}}, A::StridedCuVecOrMat{T},
beta::CuRefArray{T, CuVector{T}}, B::StridedCuVecOrMat{T}
- ) where {T, M<:AbstractMemory}
+ ) where {T, M <: AbstractMemory}
n = size(A, trans == 'N' ? 1 : 2)
return syrkx!(uplo, trans, alpha, A, B, beta, similar(A, (n, n)))
end
@@ -2205,11 +2231,12 @@ for (fname, fname_64, elty) in ((:cublasZhemm_v2, :cublasZhemm_v2_64, :ComplexF6
@eval begin
function hemm!(side::Char,
uplo::Char,
- alpha::CuRefArray{$elty, CuVector{$elty, M}},
+ alpha::CuRefArray{$elty, CuVector{$elty, M}},
A::StridedCuMatrix{$elty},
B::StridedCuMatrix{$elty},
- beta::CuRefArray{$elty, CuVector{$elty, M}},
- C::StridedCuMatrix{$elty}) where {M<:AbstractMemory}
+ beta::CuRefArray{$elty, CuVector{$elty, M}},
+ C::StridedCuMatrix{$elty}
+ ) where {M <: AbstractMemory}
mA, nA = size(A)
m, n = size(B)
mC, nC = size(C)
@@ -2247,7 +2274,7 @@ end
function hemm(
uplo::Char, trans::Char, alpha::CuRefArray{T, CuVector{T, M}},
A::StridedCuMatrix{T}, B::StridedCuMatrix{T}
- ) where {T, M<:AbstractMemory}
+ ) where {T, M <: AbstractMemory}
m, n = size(B)
return hemm!(uplo, trans, alpha, A, B, CuRef{T}(zero(T)), similar(B, (m, n)))
end
@@ -2268,10 +2295,11 @@ for (fname, fname_64, elty, relty) in (
@eval begin
function herk!(uplo::Char,
trans::Char,
- alpha::CuRefArray{$relty, CuVector{$relty, M}},
+ alpha::CuRefArray{$relty, CuVector{$relty, M}},
A::StridedCuVecOrMat{$elty},
- beta::CuRefArray{$relty, CuVector{$relty, M}},
- C::StridedCuMatrix{$elty}) where {M<:AbstractMemory}
+ beta::CuRefArray{$relty, CuVector{$relty, M}},
+ C::StridedCuMatrix{$elty}
+ ) where {M <: AbstractMemory}
mC, n = size(C)
if mC != n throw(DimensionMismatch("C must be square")) end
nn = size(A, trans == 'N' ? 1 : 2)
@@ -2305,7 +2333,7 @@ for (fname, fname_64, elty, relty) in (
trans::Char,
alpha::CuRefArray{$relty, CuVector{$relty, M}},
A::StridedCuVecOrMat{$elty}
- ) where {M<:AbstractMemory}
+ ) where {M <: AbstractMemory}
n = size(A, trans == 'N' ? 1 : 2)
return herk!(uplo, trans, alpha, A, CuRef{$relty}(zero($relty)), similar(A, (n, n)))
end
@@ -2327,11 +2355,12 @@ for (fname, fname_64, elty) in ((:cublasDsyr2k_v2, :cublasDsyr2k_v2_64, :Float64
@eval begin
function syr2k!(uplo::Char,
trans::Char,
- alpha::CuRefArray{$elty, CuVector{$elty, M}},
+ alpha::CuRefArray{$elty, CuVector{$elty, M}},
A::StridedCuVecOrMat{$elty},
B::StridedCuVecOrMat{$elty},
- beta::CuRefArray{$elty, CuVector{$elty, M}},
- C::StridedCuMatrix{$elty}) where {M<:AbstractMemory}
+ beta::CuRefArray{$elty, CuVector{$elty, M}},
+ C::StridedCuMatrix{$elty}
+ ) where {M <: AbstractMemory}
# TODO: check size of B in julia (syr2k!)
m, n = size(C)
if m != n throw(DimensionMismatch("C must be square")) end
@@ -2386,7 +2415,7 @@ function syr2k(uplo::Char,
B::StridedCuVecOrMat)
T = eltype(A)
n = size(A, trans == 'N' ? 1 : 2)
- syr2k!(uplo, trans, convert(T, alpha), A, B, zero(T), similar(A, T, (n, n)))
+ return syr2k!(uplo, trans, convert(T, alpha), A, B, zero(T), similar(A, T, (n, n)))
end
function syr2k(uplo::Char, trans::Char, A::StridedCuVecOrMat, B::StridedCuVecOrMat)
syr2k(uplo, trans, one(eltype(A)), A, B)
@@ -2400,11 +2429,12 @@ for (fname, fname_64, elty, relty) in (
@eval begin
function her2k!(uplo::Char,
trans::Char,
- alpha::CuRefArray{$elty, CuVector{$elty, M}},
+ alpha::CuRefArray{$elty, CuVector{$elty, M}},
A::StridedCuVecOrMat{$elty},
B::StridedCuVecOrMat{$elty},
- beta::CuRefArray{$relty, CuVector{$relty, M}},
- C::StridedCuMatrix{$elty}) where {M<:AbstractMemory}
+ beta::CuRefArray{$relty, CuVector{$relty, M}},
+ C::StridedCuMatrix{$elty}
+ ) where {M <: AbstractMemory}
# TODO: check size of B in julia (her2k!)
m, n = size(C)
if m != n throw(DimensionMismatch("C must be square")) end
@@ -2447,7 +2477,7 @@ for (fname, fname_64, elty, relty) in (
alpha::CuRefArray{$elty, CuVector{$elty, M}},
A::StridedCuVecOrMat{$elty},
B::StridedCuVecOrMat{$elty},
- ) where {M<:AbstractMemory}
+ ) where {M <: AbstractMemory}
n = size(A, trans == 'N' ? 1 : 2)
return her2k!(uplo, trans, alpha, A, B, CuRef{$relty}(zero($relty)), similar(A, (n, n)))
end
@@ -2477,10 +2507,11 @@ for (mmname, smname, elty) in
uplo::Char,
transa::Char,
diag::Char,
- alpha::CuRefArray{$elty, CuVector{$elty, M}},
+ alpha::CuRefArray{$elty, CuVector{$elty, M}},
A::StridedCuMatrix{$elty},
B::StridedCuMatrix{$elty},
- C::StridedCuMatrix{$elty}) where {M<:AbstractMemory}
+ C::StridedCuMatrix{$elty}
+ ) where {M <: AbstractMemory}
m, n = size(B)
mA, nA = size(A)
# TODO: clean up error messages
@@ -2499,9 +2530,10 @@ for (mmname, smname, elty) in
uplo::Char,
transa::Char,
diag::Char,
- alpha::CuRefArray{$elty, CuVector{$elty, M}},
+ alpha::CuRefArray{$elty, CuVector{$elty, M}},
A::StridedCuMatrix{$elty},
- B::StridedCuMatrix{$elty}) where {M<:AbstractMemory}
+ B::StridedCuMatrix{$elty}
+ ) where {M <: AbstractMemory}
m, n = size(B)
mA, nA = size(A)
# TODO: clean up error messages
@@ -2564,9 +2596,10 @@ for (fname, fname_64, elty) in ((:cublasDtrsmBatched, :cublasDtrsmBatched_64, :F
uplo::Char,
transa::Char,
diag::Char,
- alpha::CuRefArray{$elty, CuVector{$elty, M}},
+ alpha::CuRefArray{$elty, CuVector{$elty, M}},
A::Vector{<:StridedCuMatrix{$elty}},
- B::Vector{<:StridedCuMatrix{$elty}}) where {M<:AbstractMemory}
+ B::Vector{<:StridedCuMatrix{$elty}}
+ ) where {M <: AbstractMemory}
if length(A) != length(B)
throw(DimensionMismatch(""))
end
@@ -2620,11 +2653,12 @@ for (fname, fname_64, elty) in ((:cublasDgeam, :cublasDgeam_64, :Float64),
@eval begin
function geam!(transa::Char,
transb::Char,
- alpha::CuRefArray{$elty, CuVector{$elty, M}},
+ alpha::CuRefArray{$elty, CuVector{$elty, M}},
A::StridedCuMatrix{$elty},
- beta::CuRefArray{$elty, CuVector{$elty, M}},
+ beta::CuRefArray{$elty, CuVector{$elty, M}},
B::StridedCuMatrix{$elty},
- C::StridedCuMatrix{$elty}) where {M<:AbstractMemory}
+ C::StridedCuMatrix{$elty}
+ ) where {M <: AbstractMemory}
mA, nA = size(A)
mB, nB = size(B)
m, n = size(C)
@@ -2860,8 +2894,9 @@ for (fname, elty) in ((:cublasDgetriBatched, :Float64),
end
function getri_batched!(n, Aptrs::CuVector{CuPtr{$elty}},
- lda, Cptrs::CuVector{CuPtr{$elty}},ldc,
- pivotArray::CuArray{Cint})
+ lda, Cptrs::CuVector{CuPtr{$elty}}, ldc,
+ pivotArray::CuArray{Cint}
+ )
batchSize = length(Aptrs)
info = CuArray{Cint}(undef, batchSize)
$fname(handle(), n, Aptrs, lda, pivotArray, Cptrs, ldc, info, batchSize)
CI failures seem relevant. Feel free to ignore the formatter; I made it less spammy 😉
I really do not know what is up with the 1.11 failure, it looks
Rebase to get rid of CI failures?
Yep, next on my to do list
Attempting to address #2571

I've set the pointer mode to "device side" during handle creation. Since `gemmGroupedBatched` doesn't support device side pointer mode, it won't be usable. One workaround for this would be to add a new function to create a handle with host side mode, or add the pointer mode as an optional kwarg to `handle()`. Very open to feedback on this.

I've set this up so that users can supply `CuRef`s of the appropriate result type to the level 1 functions for results. If that's not provided, the functions execute as they do today (synchronously). Similarly, for functions taking `alpha` or `beta` scalar arguments, if the user provides a `CuRef` (actually a `CuRefArray`), the functions will execute asynchronously and return instantly. If the user provides a `Number`, the behaviour is unchanged from today. I'm not married to this design and it can certainly be changed.

cc @Jutho
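To make the two code paths concrete, here is a minimal usage sketch from the caller's side. This assumes the `CuRef`-based overloads proposed in this PR (the device-side-ref methods of `nrm2` and `axpy!` are part of this change, not released CUBLAS.jl), and it needs a CUDA device to actually run:

```julia
using CUDA, CUDA.CUBLAS

n = 2^26
X = CUDA.rand(Float64, n)
Y = CUDA.rand(Float64, n)

# Synchronous fallback: no device-side ref supplied, so the call
# blocks and returns a host scalar, exactly as it does today.
r = CUBLAS.nrm2(n, X)

# Asynchronous path: a device-side CuRef receives the result, and the
# call returns as soon as the cuBLAS operation is enqueued.
res = CuRef{Float64}(0.0)
CUBLAS.nrm2(n, X, res)

# Scalar alpha works the same way: a CuRef selects the
# device-pointer-mode path, a plain Number keeps today's behaviour.
alpha = CuRef{Float64}(2.0)
CUBLAS.axpy!(n, alpha, X, Y)   # asynchronous
CUBLAS.axpy!(n, 2.0, X, Y)     # synchronous, unchanged
```

Note that `CuRef{T}(x)` here yields a `CuRefArray` backed by a one-element `CuVector`, which is why the method signatures in the diff dispatch on `CuRefArray{$elty, CuVector{$elty, M}} where {M <: AbstractMemory}`.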